Internet Info 1997 December




Internet Message Format | 1997-02-19 | 11KB

Received: (from daemon@localhost) by services.bunyip.com (8.6.10/8.6.9) id QAA03181 for urn-ietf-out; Thu, 7 Nov 1996 16:15:17 -0500
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.6.10/8.6.9) with SMTP id QAA03176 for <urn-ietf@services.bunyip.com>; Thu, 7 Nov 1996 16:15:13 -0500
Received: from IG.CS.UTK.EDU by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA25756 (mail destined for urn-ietf@services.bunyip.com); Thu, 7 Nov 96 16:14:43 -0500
Received: from localhost by ig.cs.utk.edu with SMTP (cf v2.11c-UTK) id QAA20758; Thu, 7 Nov 1996 16:09:28 -0500 (EST)
Message-Id: <199611072109.QAA20758@ig.cs.utk.edu>
X-Mailer: exmh version 1.6.7 5/3/96
X-Uri: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: Martin J Duerst <mduerst@ifi.unizh.ch>
Cc: moore@cs.utk.edu, tallen@fsc.fujitsu.com, urn-ietf@bunyip.com
Subject: Re: [URN] I18N does not belong in URNs
In-Reply-To: Your message of "Thu, 07 Nov 1996 18:15:34 +0100." <"josef.ifi..235:07.10.96.17.15.36"@ifi.unizh.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 07 Nov 1996 16:09:28 -0500
Sender: owner-urn-ietf@services.bunyip.com
Precedence: bulk
Reply-To: Keith Moore <moore@cs.utk.edu>
Errors-To: owner-urn-ietf@bunyip.com

> >> So for grandfathering, we have two choices.
> >>
> >> 1) We interpret it as "direct grandfathering of characters",
> >>    in which case we have to allow a really wide range
> >>    of characters (i.e. ISO 10646).
> >> 2) We interpret it as "indirect grandfathering", in which
> >>    case the digits, or whatever small set, will be
> >>    enough.
> >
> >I suspect there will be some of each.
>
> I wouldn't mind "some of each", if this means "about the same
> amount for everybody on the world". If it comes to mean "a lot
> for English-speaking, but very little for the rest of the world",
> I would have to strongly disagree.

Martin,

Please accept that I agree with you here.
I don't want Internet information access protocols to be significantly biased in favor of English speakers (or anybody else). I say "please accept" because if you accept that we substantially agree on the above, it's a lot easier to understand the rest of what I'm saying.

For the case of URNs, I have no problem with the idea that they should not be friendly to English speakers -- because they really shouldn't be friendly to anybody, except to be transcribable! On the other hand, we can hardly prohibit some rare and occasional misuse of URNs such that human-friendly meaning creeps in. And we've long since established the need to grandfather in existing name spaces that have the characteristics of URNs. Some of those name spaces are going to have some minimal human-friendliness, some degree of meaning exposed in the names. But there won't be much human-friendliness in any name space that gets grandfathered in, as any naming scheme which tries to embed lots of meaning in the names will inherently have poor persistence and thus be unsuitable for URNs.

Still, it's a judgement call as to whether a particular name space fits in URN space. In making such a judgement, what matters is the overall characteristics of the name space, not which characters are and aren't allowed.

> >> Choosing ASCII only as for URLs would be very unfair cheating.
> >
> >If the names aren't human meaningful, I don't see what you're
> >complaining about.
>
> If I were sure the names would not be meaningful, I would not
> have much reason to complain. But the URL example shows that
> restricting people from becoming meaningful is very hard if
> not impossible.

A lot of people "publish" things on the web by simply editing files in a particular directory; the URLs for those files are derived from their filenames.
We can hardly expect people to assign meaningless names to files that they edit, and few people (outside the information sciences world) understand the virtue in using meaningless resource names.

I agree that it may be difficult to get most people to understand that URNs are not human-friendly names (since it's been difficult to do even within this WG). This is part of why I think the URN: prefix is important (so that URNs are easily distinguished from URLs). It's also important to promote early use of the URN: prefix for existing non-friendly name spaces (so people get used to seeing URN: followed by gibberish). Finally, new URN spaces are going to require some restrictions on how those URNs are defined. (In both RCDS and the CNRI handle system, resource names aren't chosen by users -- you ask the system to create one and it gives it back to you.)

Despite all of this, I'm sure that a few human-friendly URNs will creep in somewhere. But what's important is that URNs won't be generally useful as a means to define human-friendly names -- the case where a URN is human-friendly will be the odd exception rather than the rule. (I'm reminded of RFC numbers, which are generally assigned in sequence but are occasionally chosen to be easily remembered -- so RFC XX00 is often an "assigned numbers" RFC. My favorite example is RFC 1984.)

> You may agree with the above point, or you may disagree. But
> either way, ASCII only is unfair. If you think that naming
> schemes, protocols, standards, some review board, or whatever,
> can assure that no meaningful stuff is created, then there
> is no reason to restrict to ASCII.

The working group needs to decide which is more important -- being able to grandfather in existing URN-like name spaces from everywhere on the planet, or having all URNs be transcribable by anyone on the planet. I could certainly make a case for restricting URNs to a subset of ASCII based on the latter consideration.
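The grandfathering mapping proposed in the following paragraph (encode the native name as UTF-8, %XX-escape every octet that is not printable ASCII, and prepend URN:namespace:) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything the working group agreed on: "printable ASCII" is taken to mean 0x21 through 0x7E, "%" itself is also escaped so decoding stays unambiguous, and the namespace "example" is hypothetical.

```python
from typing import Tuple
from urllib.parse import unquote_to_bytes


def encode_name(name: str) -> str:
    """Encode a native resource name as UTF-8, then %XX-escape each octet
    that is not printable ASCII.

    Assumption: "printable ASCII" means 0x21..0x7E (space excluded), and
    "%" (0x25) is escaped as well so that decoding is unambiguous.
    """
    out = []
    for octet in name.encode("utf-8"):
        if 0x21 <= octet <= 0x7E and octet != 0x25:
            out.append(chr(octet))
        else:
            out.append("%{:02X}".format(octet))
    return "".join(out)


def to_canonical_urn(namespace: str, name: str) -> str:
    """Build the canonical form used for resolution: URN:namespace:encoded-name."""
    return "URN:" + namespace + ":" + encode_name(name)


def decode_for_display(urn: str) -> Tuple[str, str]:
    """Undo the %XX and UTF-8 encoding so a name can be shown to a human.

    Returns (namespace, decoded_name). The canonical form, not this
    decoded one, is what would be transmitted for resolution.
    """
    # Split on the first two colons only; the encoded name may contain ":".
    prefix, namespace, encoded = urn.split(":", 2)
    if prefix.upper() != "URN":
        raise ValueError("not a URN: " + urn)
    return namespace, unquote_to_bytes(encoded).decode("utf-8")


if __name__ == "__main__":
    canonical = to_canonical_urn("example", "東京-123")
    print(canonical)  # URN:example:%E6%9D%B1%E4%BA%AC-123
    # Round trip: decoding the canonical form recovers the typed-in name.
    print(decode_for_display(canonical) == ("example", "東京-123"))  # True
```

A user could keep typing names in their native script; the client would call something like `to_canonical_urn` before resolution and `decode_for_display` only when showing a name to a person.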
Perhaps the right thing to do is to define a mapping to be used when grandfathering existing non-ASCII namespaces to URNs, i.e. to encode the non-ASCII characters as UTF-8 and then encode each octet that is not printable ASCII using %XX notation. That way, people could still type in resource names as they're accustomed to doing, but the program they type them into would convert them to canonical URN format (by prepending URN:namespace: and doing the UTF-8 and %XX encoding). For purposes of resolution, URNs would always be transmitted in canonical format, but programs could decode the %XX and UTF-8 for display to humans.

> One should assume that Japanese, Russians, Chinese, or whoever, can
> be as disciplined, or as tightly controlled, as the English-speaking
> part of the world. On the other hand, if you think that namespaces
> will get meaningful because of people's nature, and you think it is
> a bad thing that has to be avoided, there is no other choice but to
> limit ourselves to the decimal digits and maybe two or three other
> characters.

We can discourage putting human-friendly names in URNs by the methods I mentioned above. Also, since URI resolution will work for either URNs or URLs, there will be no reason to insist that a name be in URN space just so it can have resolution. So the only people who will assign URNs will be those who really want long-term persistent and ugly resource names. (At least this is what I hope will happen.)

> But there can be a great deal of human-friendliness,
> without any need for ambiguity. URLs such as http://www.ibm.com
> or http://www.icrc.org are examples. My brother, who works at ICRC in
> Geneva, recently called me. Before he was able to tell me how I could
> reach him by email, I had the ICRC home page on my browser, and had
> found the generic email address explanations, and had sent him a mail.

There's probably more than one organization in the world named ICRC.
The one your brother works for got ICRC.ORG first, but the name is still ambiguous. You could just as easily have found a different ICRC than the one you were looking for. (I keep getting tripped up when I visit the web page of Global Business Network -- which is www.gbn.ORG. www.GBN.COM is somebody else.) This is only going to get worse as more companies get on the Internet.

> If it can be that human-friendly, and that unambiguous, I don't know why
> I should go to a search service to find the ICRC. And I don't know
> why people that use other scripts should go through a search service,
> while English speaking users don't have to do so.

All names are unambiguous only within a limited context. When the net was small, the context was inherently limited, so it didn't cause a problem to have only one host for any given name. But now that the net is much larger, it is a big problem. The whole huge mess over ownership of domain names, the InterNIC's capriciousness in deciding who gets what domain within .COM, and all of the harebrained proposals to add tons of vanity-plate top-level domains -- ALL of this is caused by the lack of support for human-friendly names in the Internet infrastructure, and the resulting misuse of DNS to do a job it wasn't designed to do. We're not going to misuse URNs in the same way.

> >> And I don't want to have to search through long lists
> >> of possible answers just because I use Japanese, whereas
> >> I will get one immediate answer for English.
> >
> >The point is that you will often get multiple answers for English.
> >English names are no more precise than names in other languages.
>
> No, but English can be used in URLs, and I get an unambiguous
> result if there is one, in due time. In Japanese, even if the
> result would be unambiguous, I currently can't get it fast and
> easily.

Which is better -- an ambiguous result which is usually correct, or an unambiguous result which is often incorrect?
The number of names you get back from a search for a user-friendly name is related to how ambiguous the name is. If the name you give is relatively unambiguous, you'll only get a few names back. "Kodak" (a name explicitly chosen to be globally unique) will give you relatively few hits, while "Acme" (a very common brand name) will give you far more. Part of this is because you'll already be restricting the context of the search -- if you're looking for the name of a business organization, you'll go to a service that matches user-friendly names against names of businesses. If you're looking for a person's name, you'll use a different service.

Keith